Population genomics provide insight into ancestral relationships and diversity of the feral horses of Theodore Roosevelt National Park

Abstract Theodore Roosevelt National Park (TRNP) manages a herd of feral horses (Equus caballus) which was present on the landscape prior to the establishment of the park. The population presents a unique scenario in that it has experienced fairly intensive and well‐documented management since the park's establishment, including herd size reductions, intentional introduction of diversity, and subsequent attempts to remove introduced lineages. This provides an interesting case study on the genetic effects of diverse evolutionary forces on an isolated feral population. To explore the effects of these forces and clarify the relationship of this feral herd with other horses, we used genome‐wide markers to examine the population structure of a combined dataset containing common established breeds. Using the Illumina Equine 70k BeadChip, we sampled SNPs across the genome for 118 TRNP horses and evaluated the inbreeding coefficient f and runs of homozygosity (RoH). To identify breed relationships, we compared 23 representative TRNP samples with 792 horses from 35 different breeds using genomic population structure analyses. Mean f of TRNP horses was 0.180, while the mean f for all other breeds in the dataset was 0.116 (SD 0.079). RoH analysis indicates that the TRNP population has experienced recent inbreeding in a timeframe consistent with their management. With Bayesian clustering, PCA, and maximum likelihood phylogeny, TRNP horses show genetic differentiation from other breeds, likely due to isolation, historical population bottlenecks, and genetic drift. However, maximum likelihood phylogeny places them with moderate confidence (76.8%) among draft breeds, which is consistent with the known history of breeds used on early North Dakota ranches and stallions subsequently introduced to the park herd. These findings will help resolve speculation about the origins of the herd and inform management decisions for the TRNP herd.


| INTRODUC TI ON
Feral or free-roaming horses (Equus caballus) can be found in many locations within the United States and in many other countries around the world, some with more recent or well-documented origins than others.These populations are often isolated to varying extents and usually require some form of management.There is strong public interest in feral horses on public lands and concern about the genetic health of small populations.Here, we present a case study of a feral horse population which has undergone fairly intensive and well-documented management since the 1940s and 1950s.This creates a unique study system to examine the effects of diverse evolutionary forces interacting with management actions upon the population genetics of a herd.There is also increasing public interest in the origin of this population, and a need for evidence-based decisions when considering management strategies.

| TRNP herd history
Free roaming horses existed in the badlands of southwestern North Dakota when Theodore Roosevelt National Park (TRNP) was established in 1947.At that time, many of the horses roaming within the boundary of the park were owned and branded by local ranchers.
The horses had either escaped or were released to forage and reproduce on their own, so that ranchers could recapture the horses and their offspring for later use (McLaughlin, 1989).In 1954, the park began work on the task of erecting a perimeter fence for the reintroduction of bison (Bison bison); an effort was made to round up the horses, which at that time were considered trespass livestock, to return them to their owners.Approximately 125 horses and mules were captured of the estimated 200 head present within the park boundary, 99% of which bore brands as evidence of ownership (McLaughlin, 1989).Over the next decade there were unsuccessful efforts to remove all remaining horses; in the 1970s, park administration decided to maintain the horses as a "historic livestock display" or "living history demonstration", reminiscent of the freeroaming livestock that Theodore Roosevelt documented during his residency (Harmon, 1986).
Reports vary as to the number of horses left in TRNP after these removal attempts, with some suggesting that the remaining founder individuals were one gray stallion and two mares.The consensus among reports, however, indicates that there were only about 16 individuals present in 1965 (Harmon, 1986;McLaughlin, 1989).Every few years thereafter the park conducted roundups to control population size by removing a portion of the herd.Population size was initially selected as 35-60 individuals (National Park Service, 1978).
A habitat use and forage analysis later recommended a population maximum of 90 individuals to prevent overgrazing of some forage species (Marlow et al., 1992).More recently, a population objective of 70-140 animals was suggested following a genetic analysis which found low effective population size (Cothran, 1992).Ten roundups were conducted from 1978 to 2013; each time the population was reduced by an average of 52% (TRNP records).Thus, the TRNP horse population has undergone 11 potential population bottleneck events.The herd has mostly been a closed population since the park perimeter was fenced.A few animals have likely entered the population over the years by mistake or intentional disposal by private persons, though all known trespass horses of the past 20 years have been removed (B. McCann, personal communications).In 1981 and1982, several established TRNP stallions were removed as part of an attempt to augment the herd by introduction of "well-bred" stallions.Six stallions were introduced at this time, including an Arabian, a Shire-Paint cross, a Quarter Horse, and three feral stallions from a Wyoming herd.Each had varying levels of reproductive success within the population (McLaughlin, 1989).The Shire-Paint cross stallion was reported to be highly successful: he maintained a large band of mares for almost a decade and was considered the most dominant stallion in the park.An estimated 15% of the population could be traced to this single stallion in 1991 (Cothran, 1992).Introduction of stallions was discontinued in favor of maintaining the historic type; in 1991 and 1997 attempts were made to remove the introduced stallions and some of their known offspring (TRNP records).
An oral history of the herd collected from TRNP employees and local ranchers in 1989 suggested that some of the horses which eluded capture originally were descended from "Indian type" horses of Spanish descent (McLaughlin, 1989).The horses used for ranch work in late nineteenth century North Dakota some 70 years prior to the park's establishment were often "Indian type" horses crossbred with other European breeds from the eastern US, Texas, Colorado, and Idaho (Crawford, 1931;Huidekoper, 1947;McLaughlin, 1989).Some have suggested that the TRNP herd is a unique population due to this potential association with the historical Spanish-type horse; however, this assertion has not been substantiated with genetic evidence.A phenotypic evaluation based on physical conformation and coat colors found evidence of crossbreeding in all TRNP horses and the presence of Spanish-type features, but recommended genetic evaluation for further resolution (Sponenberg, 1994).
Previous studies have estimated genetic diversity for the TRNP horses.Seven red blood cell antigen loci were tested in the 1990s to calculate genetic variability measures for the herd.These values were compared over a nine-year period from 1991 to 2000, as well as compared to average values for domestic horse breeds and other feral horse populations.There were decreasing values for expected heterozygosity (H e ) over the sampling period, with the final value falling below the average for both domestic and feral horses (H e = 0.327, compared to 0.363 and 0.349 for domestic and feral populations, respectively).The report also found a lower effective number of alleles (A e ) than the average for both domestic and feral populations (Cothran, 2000).All reported allele variants were previously described in domestic breeds; no unique alleles were found in the TRNP population (Cothran, 1992).A later analysis of mitochondrial DNA (mtDNA) and 12 short tandem repeat (STR) loci from the present-day TRNP herd was conducted by Ovchinnikov et al. (2018).
Values of H e and A e found for the TRNP herd were again lower than the average for other feral herds and most domestic breeds.Three mtDNA haplotypes were found and fully sequenced, none of which were exact genome matches to any published sequences in GenBank.Two of the mitochondrial genomes belonged to the same haplogroup and were most similar to an American Paint Horse sequence; the other belonged to a second haplogroup and had no close match to published sequences.However, the control region sequences of both haplogroups had matches to a wide variety of breeds with a global distribution.This suggests that at least two different populations or maternal sources contributed to the genetic diversity of the TRNP herd.The STR analysis was inconclusive in determining the ancestry of the park horses and showed TRNP horses as distinct from other breeds (Ovchinnikov et al., 2018).
Genomic approaches can be used to gain insights into population structure, relatedness, and genetic diversity for TRNP horses.
The development of a horse reference genome allows for a genomewide analysis, using tens of thousands of single nucleotide polymorphisms (SNPs) (Wade et al., 2009).Genome-wide SNPs have been used in many horse breeds to measure genetic diversity, identify regions of diversifying selection, and make inferences about the origins of breeds (Cosgrove et al., 2020;Gurgul et al., 2019;Petersen, Mickelson, Cothran, et al., 2013).Prior studies of domestic breeds and feral herds suggest that population genetics can be a useful tool for investigating breed associations and the potential origins of feral herds such as the horses of TRNP.
Here, we use genome-wide SNP genotypes to examine the relationship of the TRNP horses to established horse breeds.Based on the known history of the herd, we hypothesized that TRNP horses would be most genetically similar to common ranch horses in the USA today (Quarter Horses and American Paint Horses), Shires, or Spanish-type breeds, but that they would primarily appear as a distinct population apart from any one breed.We also hypothesized that confinement and management actions have contributed to recent loss in genetic diversity within the TRNP herd.We found that TRNP horses are distinct from other breeds, likely due to isolation, population bottlenecks occurring as part of herd management, and genetic drift.For similar breeds, we found that these horses are most closely allied with draft breeds, particularly the Shire, which were historically associated with surrounding ranches and documented stallion introductions in the park, but not notably related to Quarter Horses, Paints, or Spanish-type breeds.

| Sample collection
Hair samples were collected during a regularly-scheduled roundup in 2013.From these, we selected samples of 87 horses which had been re-released into the park, including almost all adult mares and reproductive band stallions.In 2017, additional tissue samples were collected as part of management activities via biopsy dart from 12 individuals that had evaded the 2013 capture.With the addition of these individuals our sample set represented an approximate census of the herd as it was at the end of 2013 (99/107 = 92.5%).In 2020 we collected an additional 18 tissue samples from young mares born post-roundup by biopsy dart.Our full dataset had 118 samples (91 mares, 27 stallions), including a sample from one more young mare.This includes 85% (94/111) of the adult individuals in the herd as of spring 2022 and represents 98% (177/181) of the herd when including offspring of sampled individuals.

| DNA extraction, genotyping, and sample selection
Genomic DNA was extracted from hair follicle and tissue samples by the Animal Genetics Laboratory at Texas A&M University using Gentra Puregene Tissue kits (Qiagen) following manufacturer's protocols.Individuals were then genotyped at Neogen Genomics Laboratory (Lincoln, NE) for over 70k SNPs located evenly across the horse genome using the Illumina Equine GGP 70 k BeadChip.
We combined the resulting genotypes with a dataset of 792 horses from 35 different breeds which had been genotyped using the Illumina Equine GGP 50k BeadChip (Petersen, Mickelson, Rendahl, et al., 2013).Mean sample size for breeds in that dataset was 22.63 individuals (Table A1); to prevent a comparatively large number of TRNP samples from skewing the principal components analysis (PCA), we selected a representative subset of 23 TRNP individuals for our final dataset.Using pedigree data determined by genetic testing and family band association records going back to the 1980s, we identified first degree relatives and excluded the younger individual (i.e., offspring) to reduce bias due to relatedness, which left 32 individuals.We then performed a random selection to reduce the dataset to the final 23 individuals, and manually checked to confirm that this subset contained individuals from both geographic regions within the park.

| Data pruning
We performed quality control filtering using SNP & Variation Suite v8.9.0 (SVS) (Golden Helix, Inc., Bozeman, MT, www.golde nhelix.com), with the methods used by Petersen, Mickelson, Cothran, et al. (2013).We first removed markers with call rate ≤0.95, and then samples with call rate ≤0.95.All 23 TRNP samples were above the threshold and were retained.This eliminated any markers which were not included in both the 70k and 50k genotyping arrays.We next removed SNPs with a minor allele frequency (MAF) of 0.05 or less.We mapped the remaining SNPs to EquCab3.0 (www.ncbi.nlm.nih.gov/ assem bly/ GCF_ 00286 3925.1/ ) using SVS and filtered to include only autosomal loci.This resulted in a final set of 815 samples and 38,786 SNPs.We further filtered the dataset for linkage disequilibrium (LD) using a window size of 50 and an increment of 5, with an LD threshold of r 2 = .5.After LD filtering, this second version of the dataset retained 815 samples and 28,505 SNPs.

| Among-breed relationships
To assess the current relationship of the TRNP horses to other breeds, we conducted a Principal Components Analysis (PCA) on the dataset pruned for MAF and call rate, as per Petersen, Mickelson, Cothran, et al. (2013).We computed the principal components in SVS using an additive model with the option selected to normalize each marker's data by its standard deviation.We then plotted the first three principal components against each other to visualize relationships among individuals of different breeds and the TRNP horses.We also calculated pairwise values of Wright's fixation index (F ST ) between all breeds and TRNP horses in SVS using the LD pruned dataset.

| Phylogenetic analysis
We conducted a phylogenetic analysis to depict evolutionary relationships among the populations.We converted the LD pruned dataset to Phylip format using vcf2phylip (Ortiz, 2019).
We included all breed samples from the Petersen, Mickelson, Cothran, et al., 2013 dataset along with the same 23 TRNP samples used for the PCA and F ST analyses.All phylogenetic analyses were performed via the CIPRES Science Gateway v3.3 (Miller et al., 2010).We converted the SNP dataset to a FASTA file using NCL converter v2.1, which maintained the original nucleotide information (Lewis, 2003).The final matrix included 815 individuals sampled for 28,501 aligned nucleotides.We aligned the resulting data using ClustalW v2.1 with standard parameters (Thompson et al., 1994), and used RAxML v8.2.12 (Stamatakis, 2014) to construct the phylogeny for these samples, with 1000 bootstrap replicates.Bootstrap values were calculated using a majority rule consensus tree with Consense (Felsenstein 1986(Felsenstein -2008)).Only bootstrap values of ≥70% are reported.We visualized the resulting phylogeny and edited the appearance of the tree using FigTree v1.4.4 (http:// tree.bio.ed.ac.uk/ sof t w are/ figtr ee/ ).

| Bayesian cluster analysis
To estimate ancestral clusters and common heritage, we used the program ADMIXTURE on the LD pruned dataset (Alexander et al., 2009).ADMIXTURE employs model-based ancestry estimation to assign individuals to clusters with similar ancestry based on genotypes.For each individual sample ADMIXTURE returns the proportion of their genome that can be assigned to each ancestral cluster based on allele frequencies.Since there were 36 breeds including the TRNP samples in the dataset, we ran the program for K = 1 through K = 36, K being the assumed number of ancestral populations.To determine the value of K that created the most accurate model to describe the dataset we followed ADMIXTURE instructions and compared the program's values for cross-validation (CV) error at each value of K, using the default setting of 5-fold CV and selecting the lowest resulting value.

| Estimates of inbreeding
To assess inbreeding and the effects of historical population management, we conducted two measures of estimating inbreeding levels.We calculated the individual inbreeding coefficient (f) for all 118 TRNP samples and each breed sample in the LD pruned dataset using SVS.Another approach to evaluating inbreeding or relatedness by descent is to identify runs of homozygosity (ROH) within the genome.These homozygous-by-descent (HBD) segments are created when an individual inherits two copies of the same stretch of chromosome from a common ancestor (Ceballos et al., 2018;Peripolli et al., 2016).We used the R package RZooRoH to identify ROHs and model the generational age of common ancestors based on segment length (Bertrand et al., 2019).RZooRoH uses hidden Markov models to relate the length of HBD segments to the age of the segments, as a more recent common ancestor will have had fewer opportunities for recombination of the homozygous segment.To evaluate the state of the TRNP herd we included all 118 TRNP samples, did not prune for LD, and used the RZooRoH default model.We used Vortex10 (Lacy & Pollak, 2021) to calculate the generation time of the TRNP herd using historical records of demographic survival and mortality rates.The resulting generation time of 10.48 years closely matched the 10 years previously reported for feral horses (National Research Council, 2013).
We also selected several other breeds with known population history for comparison.

| TRNP population structure
To examine subpopulation structure within the TRNP herd, we prepared a dataset of all 118 samples pruned by marker call rate ≤0.95, sample call rate ≤0.95,MAF ≤0.05, and autosomes only.We then conducted a PCA of this dataset in SVS as above.We assigned sam-

| Among-breed relationships
The first three principal components from our PCA account for 62.19% of total variation in the dataset.The first principal component (PC1) explains 36.90% of the variance in the dataset.The second principal component (PC2) explains 14.06% of the variance, with a more gradual decline in explanatory value from PC2 to PC10 (Table A2).On a plot of PC1 by PC2, the breeds follow the pattern described by Petersen, Mickelson, Cothran, et al. (2013).TRNP horses fall near the center of the plot (Figure 1).The TRNP cluster overlaps on PC1 with such breeds as Morgan, Lusitano, Mangalarga Paulista, Andalusian, Franches-Montagnes, New Forest Pony, Peruvian Paso, Tuva, Caspian, and Puerto Rican Paso Fino.On PC2, the TRNP horses are separated from these breeds in the direction of the draft horses such as Shire, Clydesdale, and Fell Pony.When looking at PC3 on the plot of PC1 by PC3 there is more overlap of the TRNP horses with the Tuva, New Forest Pony, and Caspian, as well as with the Akhal-Teke and French Trotter (Figure 2).Mean pairwise F ST value among breeds was 0.108, with a minimum value of 0.002 between Paint and Quarter Horse and a maximum value of 0.273 between Clydesdale and Mangalarga Paulista (Table A3).F ST values between the TRNP horses and other breeds

| Phylogenetic results
The results of the maximum likelihood analysis revealed a star phylogeny pattern with short internal branches and long external branches, typical of a lineage which has undergone rapid differentiation (Figure 3).Samples with the same breed assignment were placed mostly into their own respective clades.External nodes had high confidence bootstrap values, but many of the basal branches joining breeds were not well supported (<70% bootstrap values).TRNP horses were found to be monophyletic with strong support (100% bootstrap support) and were placed with moderate support (76.8% bootstrap support) with the Shire, Clydesdale, and Fell Pony, among other draft-type breeds such as the Percheron, Belgian, and Franches-Montagnes.The next nearest branches included the ponytype breeds (e.g., Finnhorse, Miniature, Shetland); this clade had higher support (81% bootstrap support).Spanish-type breeds (e.g., Andalusian, Peruvian Paso, Mangalarga Paulista) comprise a clade that is separate from TRNP horses with moderate support (73.3%).

| Bayesian clustering analysis
The lowest CV error returned by ADMIXTURE was observed at In the K = 21-28 range of low CV errors the TRNP horses clustered similarly to the K = 25 results (Figure A3).From K = 20 through K = 8 the TRNP horses were still assigned to their own cluster.At K = 7, the TRNP horse genomes were assigned by 0.621-0.850to a cluster that included high proportions for other individuals of many different breeds of draft type, such as Belgian, Percheron, Franches-Montagnes, and North Swedish Horse (Figure A2).At such a low number of ancestral populations most of the clusters separated into general groups of draft breeds (the Shire and Clydesdale formed their own cluster), pony breeds, Spanish, and Arabian breeds, or warmblood breeds, reflective of PCA and phylogenetic relationships.
At K = 28, the TRNP horses were split into two clusters (Figure A4).The division was aligned with the geographical categories assigned to TRNP individuals for population structure analysis.
Again, no individual from any other breed had more than a trace assignment to either TRNP cluster, and only three TRNP individuals had genome proportions between 0.0002 and 0.056 assigned to any other clusters (Figure A5).

| Estimates of inbreeding
TRNP horses had relatively high values for inbreeding coefficients compared to other breeds.The mean f of TRNP horses was 0.180, while the mean f for all other breeds in the dataset was 0.116 (standard deviation of 0.079).Only seven breeds had a mean f higher than the TRNP horses (Tables A4 and A5).proportion of the genome covered by HBD segments for TRNP individuals was 0.22 (SD 0.066; range 0.095-0.47;median 0.21; IQR 0.17-0.25)(Figure 5).The highest proportions were in generation classes 4 and 8 (Figures 5 and 6

| TRNP population structure
On the plot of PC1 by PC2 for the 118 TRNP samples, individuals are noticeably sorted by geographic category along PC1 (Figure 7).PC1 captures 12.32% of the genetic variation in the dataset, and PC2 captures 5.27%.The individuals located South of Paddock F I G U R E 3 Maximum likelihood tree with bootstrap values for horse breeds, including TRNP horses.Only bootstrap values with confidence of 70% and higher are given.TRNP horses were found to be monophyletic and were placed among draft breeds with 76.8% confidence, diverging from the branch that contains Shires, Clydesdales, and Fell Ponies.

| DISCUSS ION
By using multiple approaches to analyze the population genetics of TRNP horses, we identified overall patterns that reflect the history of this herd.These analyses place the TRNP horses well within the diversity seen in modern domestic horses, but do not show a strong signal of relatedness to any one breed, consistent with previous work and the evolutionarily recent development of most horse breeds.Based on park records, it is known that the TRNP herd has had genetic influxes from multiple sources.Admixture between breeds and the recent isolation of horse breeds makes it challenging to determine ancestry in the more distant past.As these breeds have experienced continued artificial selection for certain characteristics, they differ from horse populations that existed in the late 1800s.
Admixture can create new allele combinations and frequencies, contributing to the apparent population differentiation.However, some patterns emerge across these analyses.
The PCA plot reflects variation in genotypes and identifies unique populations and associations with phenotypic traits for the sampled breeds.Highly specialized breeds can be found as distinct from other breeds due to strong selection pressures and inbreeding, and the plot can also show evidence of admixture between populations (Petersen, Mickelson, Cothran, et al., 2013).The TRNP points are not as tightly clustered together as some of the other breeds, indicating admixture between multiple sources in their recent history (McVean, 2009).Some TRNP points are separated from the center with Spanish-type breeds on PC1 as well as with such breeds as Franches-Montagnes, Morgan, and Tuva, but are separated from them on PC2, potentially indicating some affinity between the herd and these breeds.
The general distribution of domestic breeds in PCA is consistent across multiple reports (Funk et al., 2020;Ovchinnikov et al., 2018).
Though direct comparisons cannot be made across separate analyses, there are some interesting similarities between our results and the results of another study of two feral Canadian populations evaluated with the same Petersen, Mickelson, Rendahl, et al. (2013) dataset.A large population of feral horses ("Alberta Foothills") that has ranged in size from 1000 to 1700 individuals and likely experienced continual gene inflow from multiple draft breeds and Quarter Horses/Paints appeared in a generally similar position to the TRNP horses in a PCA plot (Tollett, 2018).A second, isolated feral population of about 500 individuals ("Sable Island") was more tightly clustered on PC1 and PC2 but was noticeably separated from the main cloud of points on PC3 (Tollett, 2018).The TRNP herd, though considerably smaller than the Sable Island herd, does not show such divergence on PC3.
The maximum likelihood tree reflects the pattern seen in the PCA plot, with Thoroughbreds located in one portion of the topology while draft horses and ponies are found on the other side of the tree.The TRNP horses are found to be more similar to draft breeds and particularly the Shires, Clydesdales, and Fell Ponies, supporting the idea that the herd retains influence from the introduced Shire-Paint stallion.If this influence does come substantially from that individual, the more recent timeframe of his introduction in comparison to the development of most breeds would contribute to the lower value of bootstrap confidence observed.Due to the local popularity of Percheron horses in the late 1800s, though, the genetic contribution of draft breeds may also have been present before his introduction (Crawford, 1931;Huidekoper, 1947;McLaughlin, 1989).However, there is support for separation of the Spanish breeds from the TRNP horses, suggesting that these Spanish breeds have had limited genetic influence on TRNP horses in recent history compared with draft breeds.
During domestication, artificial selection for specialized traits, along with transport and husbandry of horses, resulted in breeds or types within a short evolutionary timeframe.Our analyses indicate that horse breed relationships are reconstructed in a star-like phylogeny, with short internal branches and long external branches, indicating rapid rates of diversification due to strong selection.
Because horse breeds successfully interbreed, it is difficult to reconstruct that genetic history with a simple bifurcating phylogeny.This difficulty is reflected in the low nodal support for internal nodes.Recombination, or gene flow between branches, can affect the shape of phylogenetic trees, lengthening the terminal branches (Li et al., 2019;Schierup & Hein, 2000).This pattern is commonly seen within domestic species, such as dogs, cattle and water buffalo, goats, and chickens (Mannen et al., 2020;Quan et al., 2020;Rout et al., 2008;Sun et al., 2020;Vonholdt et al., 2010).Other trees constructed with different data show a similar phylogenetic pattern F I G U R E 7 Principal components analysis of all 118 TRNP samples, with individuals labeled by geographic region within the South Unit of TRNP.The first two principal components explain 12.32% and 5.27% of the genetic variation, respectively.Individuals are generally sorted by geographic category along PC1.
Based on the oral history of the herd, we would expect to see some genetic similarity between the TRNP horses and one or all of the Spanish breeds.In the PCA, the TRNP horses are close to Spanish breeds on PC1 but diverge from those breeds on PC2.Relatively high F ST values indicate that there is little recent Spanish contribution to the herd.Further, the maximum likelihood tree and the ADMIXTURE analysis separate the Spanish breeds from the TRNP horses.Based on these observations, it seems that the present-day herd is not closely related to the Spanish breeds.Considering the history of horses in the Americas, though, we cannot rule out previous Spanish influences.However, there were approximately 70 years of few records between the purported import of Spanish type horses into the local area in the 1880s and the 1950s when herd management began, during which local ranchers were known to cross "Indian type" horses with European breeds (McLaughlin, 1989).Thus, any Spanish lineage would most likely have experienced admixture before the founding of the park.Additionally, most horses in the park in the 1950s were branded, suggesting considerable influence of 20th century ranching practices on herd composition.
A common thread across the ADMIXTURE, phylogeny, and PCA results is that TRNP horses are a distinct population in comparison to these domestic horse breeds.Genetic differentiation can be driven by selection, mutation, reduced gene flow, genetic drift, and nonrandom mating.Although the oldest formal breed registries have only existed for approximately 200 years, horses have been under artificial selection during their domestication for at least 4000 years (Orlando, 2020).The TRNP herd has not experienced artificial selection for specific characteristics but has had limited gene flow and a small population size (80-200 individuals) since the 1950s.A few individuals have been introduced over the history of the herd; however, the last intentional introduction as part of a management decision occurred in the 1980s.Genetic differentiation likely resulted from the isolation and repeated bottleneck events experienced by the TRNP herd, resulting in genetic drift.Essentially, the TRNP horses are more similar to each other in allelic combinations than they are to any other horse breeds.Ovchinnikov et al. (2018) reported low values of genetic variability (observed heterozygosity and allelic diversity) in the TRNP herd compared to both domestic breeds and other feral herds.This is also reflected in inbreeding coefficients from genome-wide analysis of SNPs, with the TRNP horses having higher values of f than most other breeds.The F ST values between TRNP and other breeds were all near or higher than the average among all breeds.F ST measures genetic differentiation between populations using allele frequencies and can indicate reduced gene flow between those populations.
While the TRNP horses are placed among the draft breeds on the phylogenetic tree, the Shire and Clydesdale also have some of the highest values for inbreeding coefficient, f.In combination with the high mean f of the TRNP population, this resulted in high values of F ST between these breeds, despite some shared ancestry.We know that the TRNP population has been isolated for many years, and the However, HBD segments are likely overestimated in our dataset, due to the density of SNPs called in the array.Lavanchy and Goudet (2023) demonstrated that SNP density is an important metric in accurately assessing HBD segment presence and recommend using high-density (>11 SNPs/Mb) datasets.While more recent inbreeding and longer HBD segments are easier to estimate, lower SNP density can result in overestimation of small segments.Still, since the TRNP population is small and inbred, some of the unsampled portions of the genome are likely to also be homozygous.
RZooRoH produced HBD class results consistent with the known history of several other breeds, indicating that the TRNP results for HBD classes were likely reasonable.For example, Clydesdales experienced a bottleneck following agricultural mechanization and their use in WWI and WWII during the 1920s-1940s (Hendricks, 1995).RZooRoH assigned a higher proportion of their genome to 8 generations ago, consistent with this timeframe.A more recent and severe reduction in population size occurred with the Florida Cracker with only 31 individuals present in 1989 (Florida Cracker Horse Association; Conant et al., 2012), which is reflected by the highest HBD level occurring in the four generation class.These breeds and the TRNP horses have a clear signature of inbreeding compared with the Quarter Horse, a breed with very large population size and admixture from multiple sources, but which has only formally existed since 1940.In fact most horse breeds and breed registries were only established within the last few centuries, alongside an increased awareness of heritability (Hendricks, 1995).The HBD classes here extend out to hundreds of generations and thousands of years ago and suggest a level of inbreeding that may be present for the species in general which occurred around domestication events.
As we hypothesized, large scale roundup actions have likely caused genetic bottlenecks in the herd and led to inbreeding, with the initial reduction of herd size at the time of the installation of the perimeter fence likely producing the largest effect.Inbreeding can also be exacerbated by subpopulation differentiation within a population.Smaller subpopulations can experience increased effects of genetic drift and an overall loss of heterozygosity within the population as a whole.Within the TRNP herd there is evidence of some population structure, as seen in the TRNP PCA and the splitting of the TRNP ADMIXTURE cluster at K = 28 A5).Observational data shows that many of the TRNP horses have a behavioral tendency to inhabit a regular area within the park and disperse to join other family bands that also frequent the same area.Though the Paddock Creek corridor is not a restrictive physical barrier, there seems to be enough site fidelity to contribute to nonrandom mating and the development of genetic population structure.However, there is still frequent gene flow within the population, as evidenced by the mixed parentage individuals, and likely enough gene flow to prevent substantial subpopulation differentiation.

| CON CLUS IONS
The use of much larger numbers of markers in a SNP array provided a more in-depth evaluation of this feral population than had previously been possible with limited markers.While the TRNP population continued to group separately from other breeds, consistent with previous work, we did detect genetic relationships where previous work had been inconclusive.The strongest observed similarity is between TRNP horses and some draft breeds, based on phylogeny and ADMIXTURE relationships.In particular, the placement of the

| IMPLIC ATIONS
If a reproductive herd is to be maintained, approaches to reduce the effect of continued isolation on genetic diversity can be considered.Genetic diversity in a closed population may be increased with introduction of new individuals.Periodic introductions could be used to counter the effects of genetic drift.Depending on long term management objectives, individuals for introduction could be chosen from a more genetically distant population to maximize variation, or from a population with a more similar background.Analysis of additional samples from other feral herds across the country might identify more relationships, potentially suggesting another similar source from which to select animals for introduction.

| PERMIT INFORMATION
ples to geographic regions based on observational data of spatial use by each family band of horses.The South Unit of the park was partitioned into two geographic regions by an intermittent stream: the categories were "North of Paddock Creek (NoPC)", "South of Paddock Creek (SoPC)", or individuals of mixed parentage resulting from observed dispersal.
populations, though CV error values of K in the 21-28 F I G U R E 1 Principal components analysis of genetic variation among horse breeds and feral horses from TRNP, showing one plane of the cloud of points (PC1 by PC2).Points represent individuals.PC1 captures 36.90% of the variation in the dataset.PC2 captures 14.06% of the variation.TRNP horses fall into a group in the center of the plot, indicated by pink circles.range were of similarly low values (Figure A1).At K = 25 each TRNP horse was grouped by a majority of their genome into the same cluster with minimal assignment to other clusters (Figure 4).The proportion of TRNP genomes assigned to this cluster ranged from 0.703 to 0.999, with a mean of 0.952.No individuals from other breeds had notable assignment to the TRNP horse cluster.Four TRNP individuals had a proportion of 0.039-0.075 of their genomes assigned to another cluster which included high genome proportions from Quarter Horse, Paint, and Florida Cracker individuals, and a 0.021-0.033proportion assigned to a cluster including Thoroughbred, Hanoverian, Shire, Quarter Horse, Paint, Swiss Warmblood, and Maremmano individuals.Two TRNP individuals had 0.010-0.031 of their genome assigned to each of three clusters including Andalusian, Lusitano, Percheron, Shire, and Clydesdale individuals.The individuals from other breeds that had the highest proportion of assignment with the TRNP cluster (range 0.032-0.054)were a Swiss Warmblood, Hanoverian, Morgan, Saddlebred, two Paints, Maremmano, Quarter Horse, Puerto Rican Paso Fino, and a New Forest Pony.
RZooRoH identified 9195 HBD sequences among the 118 TRNP samples, on all autosomes and in all individuals.The mean F I G U R E 2 Principal components analysis of genetic variation among horse breeds and feral horses from TRNP, showing the same cloud of points from another plane (PC1 by PC3).Points represent individuals.PC3 captures 11.23% of the variation in the dataset.TRNP horses fall into a group in the center of the plot, indicated by pink circles.
).With a generation time of 10 years this corresponds to common ancestors approximately 40-80 years ago, suggesting bottlenecking or founder events around that timeframe.For comparison, Figure 6 also shows the Clydesdale and Florida Cracker, which have both undergone recent genetic bottlenecks, the Puerto Rican Paso Fino, which experienced a more distant bottleneck during the importation of Spanish horses to the Americas, and the Quarter Horse, which has multiple sources of recent admixture.

F
Ancestry estimation using ADMIXTURE modeling.The number of ancestral populations (clusters) K = 25 was chosen based on ADMIXTURE's CV error calculation.Vertical lines represent individuals, with colors representing the proportion of their genome attributed to each ancestral cluster.The TRNP horses make up their own red cluster (leftmost) with minimal shared ancestry from other clusters.Creek (SoPC) are more closely grouped than are individuals North of Paddock Creek (NoPC).Individuals of mixed parentage are between and among the two categories.On PC2, the NoPC points have a wider spread than do the SoPC points.
of the cluster, falling toward the Shires and other draft breeds.This separation may reflect the influence of the Shire-Paint stallion introduced to the park in the 1980s.On the PCA plot, the five Spanishtype breeds are all tightly clustered together.The TRNP points align F I G U R E 5 Proportion of the genome in each homozygosity by descent (HBD) class for all 118 TRNP individuals as estimated by RZooRoH.HBD classes represent inbreeding level based on number of generations removed to common ancestor, where HBD class 2 corresponds with 1 generation to common ancestor, HBD class 4 with 2 generations, etc. F I G U R E 6 Average proportion of the genome in each generation class for all 118 TRNP horses in comparison to four other breeds of known population history, as estimated by RZooRoH.Clydesdale and Florida Cracker have undergone recent bottlenecks, Puerto Rican Paso Fino an older bottleneck, and Quarter Horses have been recently admixed.HBD classes represent inbreeding level based on number of generations removed to common ancestor, with lower generation numbers corresponding to more recent inbreeding.TRNP horses have highest presence of HBD segments in classes 4 and 8 generations ago, which corresponds with the initial population bottleneck from the time of the park's establishment.
lack of recent gene flow with other breeds is supported by these high F ST values.Though F ST values are lower between TRNP and Paint and Quarter Horses than TRNP and any draft breed, other results do not single out these breeds as recent contributors to the park population.Morgan horses are another putative source of TRNP ancestry based on PCA results; Morgan horses are the oldest remaining North American horse breed, originating in the late 1700s and early 1800s for use on farms (Battell, 1894).Perhaps the ranch horses of early western North Dakota were genetically similar to the early work horses of the eastern US and to the early Quarter Horses of Texas, before their differentiation into strictly kept breeds.It is possible that these early work horses contributed to the TRNP population.The ROH analysis also shows that the TRNP horses have experienced relatively recent inbreeding.The high proportion of HBD segments in the genome due to inheritance from common ancestors four and eight generations ago coincides with the herd's isolation and known bottleneck events.The initial bottleneck occurred 60-70 years ago (6-7 generations) in the 1950s and 60s following the establishment of the park, when the majority of the horses on the land were rounded up and returned to their owners at the same time as the park perimeter was fenced and the remaining population was isolated.A reduction of the TRNP population to 16 individuals, followed by low gene flow into the population, explains the current presence of large chromosome segments inherited from common ancestors.The presence of these HBD segments and a relatively high inbreeding coefficient suggest that the bottlenecks experienced in the recent history of the TRNP herd have affected genetic diversity of the population.In the case of the individual horse with the highest presence of HBD segments (0.47), pedigree records indicate that this individual likely had the same stallion-mare mating in the P2 generation on one side and the P3 generation on the other.Chromosome 1 for this individual appeared almost entirely homozygous.
TRNP horses on a branch next to Shires, Clydesdales, and Fell Ponies in the maximum likelihood tree indicates that descendants of the introduced Shire-Paint stallion persist in the present-day herd.The inbreeding analyses indicate that the TRNP herd has experienced inbreeding and differentiation from other breeds, likely due to genetic drift, bottleneck events, and limited gene flow.Historical management actions likely exacerbated the inbreeding levels within the population, especially the original population bottleneck and initial removal attempt at the time of the perimeter fence installation in the 1950s.
2020 biopsy samples were collected under research permit number THRO-2020-SCI-0013 and NPS IACUC approval (ND_THRO_ McCann_HorseBiopsyDarting_2020.A1).How to cite this article: Thompson, M. A., McCann, B. E., Rhen, T., & Simmons, R. (2024).Population genomics provide insight into ancestral relationships and diversity of the feral horses of Theodore Roosevelt National Park.Ecology and Evolution, 14, e11197.https://doi.org/10.1002/ece3.11197F I G U R E A 1 Cross-validation error for each value of K, calculated by ADMIXTURE.The most likely value of K was chosen by the lowest error value.F I G U R E A 2 Ancestry estimation using ADMIXTURE modeling with the value of K = 7, the lowest value of K for which the TRNP horses are primarily assigned to a cluster that includes other breeds.Vertical lines represent individuals, with colors representing the proportion of their genome attributed to each ancestral cluster.The TRNP horses are assigned to a cluster that also includes individuals from many breeds of draft type, including Belgian, Percheron, Franches-Montagnes, North Swedish Horse, Norwegian Fjord, Finnhorse, Mongolian, Tuva, and New Forest Pony.The Shire and Clydesdale make up their own cluster at K = 7. F I G U R E A 3 Ancestry estimation using ADMIXTURE modeling with the value of K = 21.Vertical lines represent individuals, with colors representing the proportion of their genome attributed to each ancestral cluster.At K = 21 the CV error is only slightly higher than the lowest value of K = 25.At K = 21, the TRNP horses are still assigned to their own cluster, which no other individuals share.F I G U R E A 4 Ancestry estimation using ADMIXTURE modeling with the value of K = 28.Vertical lines represent individuals, with colors representing the proportion of their genome attributed to each ancestral cluster.At K = 28 the CV error is only slightly higher than the lowest value of K = 25.At K = 28, the TRNP horses are split into two clusters, still unshared with any other breed.F I G U R E A 5 Closer detail to highlight the TRNP results from the ADMIXTURE K = 28 model in Figure A4.The TRNP horses are split into two exclusive clusters that correspond with their geographical region (North or South of Paddock Creek) within the park.TA B L E A 1 Number of individuals of each domestic breed included in analyses.Pairwise F ST values calculated for 36 populations/breeds.Note: Rows are sorted by value in comparison to TRNP.TA B L E A 5Values of inbreeding coefficient f for all 118 TRNP samples, as well as the number of observed homozygous loci and the number of expected homozygous loci as calculated by SVS.